Application of Duplicate Records detection Techniques to Duplicate Payments in a Real Business Environment

نویسنده

  • Hussein Issa
چکیده

Databases are increasing in size at an exponential rate, especially with the business electronization and real-time economy. Processes that were once stored on paper are now stored in databases. Sometimes errors – such as data entry, incomplete information, and unstandardized formats from different data sources – can lead to the existence of more than one representation of the same object in a database. This paper presents the problem of duplicate records and their detection, and addresses the issue of one type of records in specific which is of great interest in the business world: that of duplicate payments. An application of record matching techniques to the database of a telecommunication company is used as an illustration.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Method for Duplicate Detection Using Hierarchical Clustering of Records

Accuracy and validity of data are prerequisites of appropriate operations of any software system. Always there is possibility of occurring errors in data due to human and system faults. One of these errors is existence of duplicate records in data sources. Duplicate records refer to the same real world entity. There must be one of them in a data source, but for some reasons like aggregation of ...

متن کامل

A Comparative Study in Classification Techniques for Unsupervised Record Linkage Model

Problem statement: Record linkage is a technique which is used to detect and match duplicate records which are generated in data integration process. A variety of record linkage algorithms with different steps have been developed in order to detect such duplicate records. To find out whether two records are duplicate or not, supervised and unsupervised classification techniques are utilized in ...

متن کامل

A Study of Progressive Techniques for Efficient Duplicate Detection

---Databases contains very large datasets, where various duplicate records are present. The duplicate records occur when data entries are stored in a uniform manner in the database, resolving the structural heterogeneity problem. Detection of duplicate records are difficult to find and it take more execution time. In this literature survey papers various techniques used to find duplicate record...

متن کامل

A Domain-Independent Data Cleaning Algorithm for Detecting Similar-Duplicates

Data mining algorithms generally assume that data will be clean and consistent. However, in practice, this is not always the case, and for this reason the detection and elimination of duplicate records is an important part of data cleaning. The presence of similar-duplicate records causes over-representation of data. If the database contains different representations of the same data, the resul...

متن کامل

Duplicate Detection of Records in Queries Using Clustering

The problem of detecting and eliminating duplicated data is one of the major problems in the broad area of data cleaning and data quality in data warehouse. Many times, the same logical real world entity may have multiple representations in the data warehouse. Duplicate elimination is hard because it is caused by several types of errors like typographical errors, and different representations o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010